-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize haste map data structure for serialization/deserialization. #8171
Conversation
'fruits/__mocks__/Pear.js': ['', 32, 42, 0, ['Melon'], null], | ||
'vegetables/Melon.js': ['Melon', 32, 42, 0, [], null], | ||
'fruits/Banana.js': ['Banana', 32, 42, 0, 'Strawberry', null], | ||
'fruits/Pear.js': ['Pear', 32, 42, 0, 'Banana/Strawberry', null], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing \t delimiter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is crazy. I love it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woo! Keep'em coming 😀
(changelog 😬 )
These numbers sound great, thanks! I think we should use the colon ( |
Try e.g. require('fs').writeFileSync('/tmp/\t', 'asdf') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated above we should change the delimiter to :
like it is used in $PATH
etc., other than LGTM!
We talked about it, but just so it's written down - a |
Yeah, I'd recommend using @scotthovestadt did you measure the impact in building the haste map or the full startup time? I imagine this can have an impact in the time to determine the tests to run (see https://github.com/facebook/jest/blob/master/packages/jest-resolve-dependencies/src/index.ts) as it resolves the dependencies of all the modules in the haste map. Maybe it'd be worth it to memoize the array (even in the same object, as it won't be persisted again). |
@jeysal Will extract the delim to a var and use Considered memoizing but don't think it'll be worth it with the current access patterns. I did test the haste map generation and while there might be a small hit it wasn't actually measurable because:
In any case, it can be optimized away by doing the join in the worker thread (which would introduce the complexity of detecting which type to avoid a breaking change). I didn't go there yet because I wasn't able to measure a difference by doing that... only pushing up changes where I can measure a significant difference. :) |
@scotthovestadt my point is that if you only measure haste map build time you won't see that the cost might be transferring to a later stage in the process (in this case test determination). I'd try to measure the time it takes to get the list of tests that need to run when you use |
Codecov Report
@@ Coverage Diff @@
## master #8171 +/- ##
==========================================
- Coverage 62.28% 62.28% -0.01%
==========================================
Files 265 265
Lines 10440 10435 -5
Branches 2538 2535 -3
==========================================
- Hits 6503 6499 -4
- Misses 3352 3354 +2
+ Partials 585 582 -3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Thanks @scotthovestadt!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Summary
This simple PR is the result of a ridiculous number of attempts to specifically optimize the serialization and deserialization of the haste map followed by a simple realization: creating arrays is expensive. :)
Currently, we store each file's dependencies as an expensive array. However, we don't typically access many file dependencies, so it's not time well-spent.
I've modified the data structure to store the file dependencies as a much cheaper string (tab separated) that is deserialized to an array on-demand. No change to the public interface, but now we're not doing any unnecessary work.
I profiled both the full Jest run time and the read/persist time to ensure I had a clear view into the characteristics of this change. It's somewhere in the neighborhood of a 10%~ startup time improvement at FB, which is significant.
Benchmarking against a generated benchmark hash map of 300,000 files:
before--
read: 2,291ms
persist: 2,413ms
total: 4,704ms
after--
read: 1,852ms
persist: 1,539ms
total: 3,391ms
delta: 1,313ms
Test plan